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Restricted Boltzmann Machines (RBMs) and Deep Belief Networks have been 
demonstrated to perform efficiently in a variety of applications, such as dimensionality 
reduction, feature learning, and classification. Their implementation on neuromorphic 
hardware platforms emulating large-scale networks of spiking neurons can have significant 
advantages from the perspectives of scalability, power dissipation and real-time interfacing 
with the environment. However, the traditional RBM architecture and the commonly used 
training algorithm known as Contrastive Divergence (CD) are based on discrete updates 
and exact arithmetics which do not directly map onto a dynamical neural substrate. Here, 
we present an event-driven variation of CD to train a RBM constructed with Integrate 
& Fire (l&F) neurons, that is constrained by the limitations of existing and near future 
neuromorphic hardware platforms. Our strategy is based on neural sampling, which 
allows us to synthesize a spiking neural network that samples from a target Boltzmann 
distribution. The recurrent activity of the network replaces the discrete steps of the CD 
algorithm, while Spike Time Dependent Plasticity (STDP) carries out the weight updates 
in an online, asynchronous fashion. We demonstrate our approach by training an RBM 
composed of leaky l&F neurons with STDP synapses to learn a generative model of 
the MNIST hand-written digit dataset, and by testing it in recognition, generation and 
cue integration tasks. Our results contribute to a machine learning-driven approach for 
synthesizing networks of spiking neurons capable of carrying out practical, high-level 
functionality. 

Keywords: synaptic plasticity, neuromorphic cognition, Markov chain monte carlo, recurrent neural network, 
generative model 



1. INTRODUCTION 

Machine learning algorithms based on stochastic neural network 
models such as RBMs and deep networks are currently the state- 
of-the-art in several practical tasks (Hinton and Salakhutdinov, 
2006; Bengio, 2009). The training of these models requires sig- 
nificant computational resources, and is often carried out using 
power-hungry hardware such as large clusters (Le et al, 2011) 
or graphics processing units (Bergstra et al., 2010). Their imple- 
mentation in dedicated hardware platforms can therefore be very 
appealing from the perspectives of power dissipation and of 
scalability. 

Neuromorphic Very Large Scale Integration (VLSI) systems 
exploit the physics of the device to emulate very densely the 
performance of biological neurons in a real-time fashion, while 
dissipating very low power (Mead, 1989; Indiveri et al., 2011). 
The distributed structure of RBMs suggests that neuromorphic 
VLSI circuits and systems can become ideal candidates for such a 
platform. Furthermore, the communication between neuromor- 
phic components is often mediated using asynchronous address- 
events (Deiss et al, 1998) enabling them to be interfaced with 
event-based sensors (Liu and Delbruck, 2010; Neftci et al., 2013; 
O'Connor et al, 2013) for embedded applications, and to be 
implemented in a very scalable fashion (Silver et al., 2007; Joshi 
et al, 2010; Schemmel et al, 2010). 



Currently, RBMs and the algorithms used to train them are 
designed to operate efficiently on digital processors, using batch, 
discrete-time, iterative updates based on exact arithmetic calcula- 
tions. However, unlike digital processors, neuromorphic systems 
compute through the continuous-time dynamics of their compo- 
nents, which are typically Integrate & Fire (l&F) neurons (Indiveri 
et al., 2011), rendering the transfer of such algorithms on such 
platforms a non-trivial task. We propose here a method to con- 
struct RBMs using l&F neuron models and to train them using 
an online, event-driven adaptation of the Contrastive Divergence 
(CD) algorithm. 

We take inspiration from computational neuroscience to 
identify an efficient neural mechanism for sampling from the 
underlying probability distribution of the RBM. Neuroscientists 
argue that brains deal with uncertainty in their environments 
by encoding and combining probabilities optimally (Doya et al., 
2006), and that such computations are at the core of cogni- 
tive function (Griffiths et al., 2010). While many mechanistic 
theories of how the brain might achieve this exist, a recent neu- 
ral sampling theory postulates that the spiking activity of the 
neurons encodes samples of an underlying probability distribu- 
tion (Fiser et al, 2010). The advantage for a neural substrate 
in using such a strategy over the alternative one, in which 
neurons encode probabilities, is that it requires exponentially 
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fewer neurons. Furthermore, abstract model neurons consis- 
tent with the behavior of biological neurons can implement 
Markov Chain Monte Carlo (MCMC) sampling (Buesing et al., 
2011), and RBMs sampled in this way can be efficiently trained 
using CD, with almost no loss in performance (Pedroni et al., 
2013). We identify the conditions under which a dynamical 
system consisting of I&F neurons performs neural sampling. 
These conditions are compatible with neuromorphic imple- 
mentations of I&F neurons (Indiveri et al, 2011), suggest- 
ing that they can achieve similar performance. The calibra- 
tion procedure necessary for configuring the parameters of the 
spiking neural network is based on firing rate measurements, 
and so is easy to realize in software and in hardware plat- 
forms. 

In standard CD, weight updates are computed on the basis 
of alternating, feed-forward propagation of activities (Hinton, 
2002). In a neuromorphic implementation, this translates to 
reprogramming the network connections and resetting its 
state variables at every step of the training. As a consequence, 
it requires two distinct dynamical systems: one for normal 
operation (i.e., testing), the other for training, which is highly 
impractical. To overcome this problem, we train the neural RBMs 
using an online adaptation of CD. We exploit the recurrent 
structure of the network to mimic the discrete "construction" 
and "reconstruction" steps of CD in a spike-driven fashion, and 
Spike Time Dependent Plasticity (STDP) to carry out the weight 
updates. Each sample (spike) of each random variable (neuron) 
causes synaptic weights to be updated. We show that, over longer 
periods, these microscopic updates behave like a macroscopic 
CD weight update. Compared to standard CD, no additional 
connectivity programming overhead is required during the 
training steps, and both testing and training take place in the 
same dynamical system. 

Because RBMs are generative models, they can act simulta- 
neously as classifiers, content-addressable memories, and carry 
out probabilistic inference. We demonstrate these features in a 
MNIST hand-written digit task (LeCun et al., 1998), using an 
RBM network consisting of one layer of 824 "visible" neurons 
and one layer of 500 "hidden" neurons. The spiking neural net- 
work was able to learn a generative model capable of recognition 
performances with accuracies up to 91.9%, which is close to the 
performance obtained using standard CD and Gibbs sampling, 
93.6%. 

2. MATERIALS AND METHODS 

2.1. NEURAL SAMPLING WITH NOISY I&F NEURONS 

We describe here conditions under which a dynamical system 
composed of I&F neurons can perform neural sampling. It has 
been proven that abstract neuron models consistent with the 
behavior of biological spiking neurons can perform MCMC sam- 
pling of a Boltzmann distribution (Buesing et al, 2011). Two 
conditions are sufficient for this. First, the instantaneous firing 
rate of the neuron verifies: 

{0 if t - t' < x r 

(1) 
r(«(f)) t-t' >x r 



with r(u(f)) proportional to exp(w(f)), where u(t) is the 
membrane potential and x r is an absolute refractory period dur- 
ing which the neuron cannot fire. p(u(t), t— t') describes the 
neuron's instantaneous firing rate as a function of u{i) at time 
f, given that the last spike occurred at t' . It can be shown that the 
average firing rate of this neuron model for stationary u(t) is the 
sigmoid function: 

p(u) = (x r + exp(-u))- 1 . (2) 

Second, the membrane potential of neuron i is equal to the linear 
sum of its inputs: 

N 

Ui(t) = b, + J2 WijZjit), Vi = 1 N, (3) 

J=l 

where fo; is a constant bias, and Z;(t) represents the pre-synaptic 
spike train produced by neuron j defined as being equal to 1 when 
the pre-synaptic neuron spikes for a duration x r , and equal to zero 
otherwise. The terms w«z;(f) are identified with the time course 
of the Post-Synaptic Potential (PSP), i.e., the response of the 
membrane potential to a pre-synaptic spike. The two conditions 
above define a neuron model, to which we refer as the "abstract 
neuron model." Assuming the network states are binary vectors 
[zi, . . . , Zfc], it can be shown that, after an initial transient, the 
sequence of network states can be interpreted as MCMC samples 
of the Boltzmann distribution: 

p(zi, z k ) =- exp( - E(zi, z^)), with 

1 (4) 
E(zi , . . . , z*) = - - ^2 W l} ZiZ } - ^2 hZi, 

v i 

where Z = exp( — E(z\ , . . . , Z]S) is a constant such that 

p sums up to unity, and E(z\ , . . . , zjt) can be interpreted as an 
energy function (Haykin, 1998). 

An important fact of the abstract neuron model is that, accord- 
ing to the dynamics of Zj(t), the PSPs are "rectangular" and 
non-additive since no two presynaptic spikes can occur faster than 
the refractive period. The implementation of synapses producing 
such PSPs on a large scale is very difficult to realize in hardware, 
when compared to first-order linear filters that result in "alpha"- 
shaped PSPs (Destexhe et al, 1998; Bartolozzi and Indiveri, 2007). 
This is because, in the latter model, the synaptic dynamics are lin- 
ear, such that a single hardware synapse can be used to generate 
the same current that would be generated by an arbitrary num- 
ber of synapses (see also next section). As a consequence, we will 
use alpha-shaped PSPs instead of rectangular PSPs in our mod- 
els. The use of the alpha PSP over the rectangular PSP is the major 
source of degradation in sampling performance, as we will discuss 
in section 2.2. 

2.1.1. Stochastic I&F neurons 

A neuron whose instantaneous firing rate is consistent with 
Equation (1) can perform neural sampling. Equation (1) is a gen- 
eralization of the Poisson process to the case when the firing prob- 
ability depends on the time of the last spike (i.e., it is a renewal 
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process), and so can be verified only if the neuron fires stochasti- 
cally (Cox, 1962). Stochasticity in I&F neurons can be obtained 
through several mechanisms, such as a noisy reset potential, 
noisy firing threshold, or noise injection (Plesser and Gerstner, 
2000). The first two mechanisms necessitate stochasticity in the 
neuron's parameters, and therefore may require specialized cir- 
cuitry. But noise injection in the form of background Poisson 
spike trains requires only synapse circuits, which are present in 
many neuromorphic VLSI implementation of spiking neurons 
(Bartolozzi and Indiveri, 2007; Indiveri et al, 201 1). Furthermore, 
Poisson spike trains can be generated self-consistently in balanced 
excitatory-inhibitory networks (van Vreeswijk and Sompolinsky, 
1996), or using finite-size effects and neural mismatch (Amit and 
Brunei, 1997). 

We show that the abstract neuron model in Equation (1) can 
be realized in a simple dynamical system consisting of leaky I&F 
neurons with noisy currents. The neuron's membrane potential 
below firing threshold 6 is governed by the following differential 
equation: 



C-rUi = ~gm + 1,(0 + a%(t), u,(t) e (-00, 6), (5) 
at 



where C is a membrane capacitance, w,- is the membrane potential 
of neuron i, gi is a leak conductance, cr^ (f ) is a white noise term of 
amplitude a (which can for example be generated by background 
activity), J,(f) its synaptic current and 6 is the neuron's firing 
threshold. When the membrane potential reaches 6, an action 
potential is elicited. After a spike is generated, the membrane 
potential is clamped to the reset potential u rst for a refractory 
period x r . 

In the case of the neural RBM, the currents 7,(t) depend on the 
layer the neuron is situated in. For a neuron i in layer v 



For a neuron; in layer h, 



ii(t) = if(t) + iy(t), 



N h 



Tsyn dt 7,U = ~ /,U + J2l h iM f > + 1b,b Vi (t), 



(6) 



where I d (t) is a current representing the data (i.e., the external 
input), I" is the feedback from the hidden layer activity and the 
bias, and the <j's are the respective synaptic weights, and b v (f) is 
a Poisson spike train implementing the bias. Spike trains are rep- 
resented by a sum of Dirac delta pulses centered on the respective 
spike times: 



MO = I] 5 ( f - f k)> hj(f) = I] S{t-tk) 



(7) 



<:€% 



ice Spj 



where Sp t and Spj are the set of the spike times of the bias neuron 
b Vi and the hidden neuron hj, respectively, and 8(f) = 1 if t = 0 
and 0 otherwise. 



t -I h 
syn dti 



-if + Y^qv ij v l (t) + q bj b hj (t), 



(8) 



i= l 



where I h is the feedback from the visible layer, and v(t) and £>/,(f) 
are Poisson spike trains of the visible neurons and the bias neu- 
rons, defined similarly as in Equation (7). The dynamics of I and 
I v correspond to a first-order linear filter, so each incoming spike 
results in PSPs that rise and decay exponentially (i.e., alpha-PSP) 
(Gerstner and Kistler, 2002). 

Can this neuron verify the conditions required for neural sam- 
pling? The membrane potential is already assumed to be equal to 
the sum of the PSPs as required by neural sampling. So to answer 
the above question we only need to verify whether Equation ( 1 ) 
holds. Equation (5) is a Langevin equation which can be analyzed 
using the Fokker-Planck equation (Gardiner, 2012). The solution 
to this equation provides the neuron's input/output response, i.e., 
its transfer curve (for a review, see Renart et al., 2003): 



P("o) 



"0 



x r + x m ^/ji V dxexp(x z )(l + erf(x)) I , (9) 

J "rst-»Q 

<y v 



-1 



where erf is the error function (the integral of the normal dis- 
tribution), tin = Jj- is the stationary value of the membrane 

potential when injected with a constant current I, x m = ^ is the 

membrane time constant, « rst is the reset voltage, and Oy(u) = 
o 2 /(g L Q. 

According to Equation (2), the condition for neural sampling 
requires that the average firing rate of the neuron to be the sig- 
moid function. Although the transfer curve of the noisy I&F 
neuron Equation (9) is not identical to the sigmoid function, it 
was previously shown that with an appropriate choice of param- 
eters, the shape of this curve can be very similar to it (Merolla 
et al., 2010). We observe that, for a given refractory period x r , 
the smaller the ratio ~ in Equation (5), the better the trans- 
fer curve resembles a sigmoid function (Figure 1). With a small 



ay 



the transfer function of a neuron can be fitted to 



v(Z) 



1 ( y | exp(-Jp) V 



(10) 



where P and y are the parameters to be fitted. The choice of 
the neuron model described in Equation (5) is not critical for 
neural sampling: A relationship that is qualitatively similar to 
Equation (9) holds for neurons with a rigid (reflective) lower 
boundary (Fusi and Mattia, 1999) which is common in VLSI 
neurons, and for I&F neurons with conductance-based synapses 
(Petrovici et al, 2013). 

This result also shows that synaptic weights q Vj , q^. , which have 
the units of charge are related to the RBM weights Wy by a factor 
To relate the neural activity to the Boltzmann distribution, 
Equation (4), each neuron is associated to a binary random vari- 
able which is assumed to take the value 1 for a duration x r after the 
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-120 0 



FIGURE 1 | Transfer curve of a leaky l&F neuron for three different 
parameter sets where uo = -L, and -1 = 250 [Hz] (dashed gray). In this 
plot, a v is varied to produce different ratios 6 " o ^"' . The three plots above 
shows that the fit with the sigmoid function (solid black) improves as the 
ratio decreases. 



neuron has spiked, and zero otherwise, similarly to Buesing et al. 
(2011). With this encoding, the network state is characterized by 
a binary vector having the same number of entries as the number 
of neurons in the network. The relationship between this ran- 
dom vector and the I&F neurons' spiking activity is illustrated in 
Figure 3. The membrane potential of the neuron (black) evolves 
in a random fashion until it spikes, after which it is clamped to 
u rst for a duration x r (gray). While the neuron is in the refrac- 
tory period, the random variable associated to it is assumed to 
takes the value 1. This way, the state of the network can always 
be associated with a binary vector. According to the theory, the 
dynamics in the network guarantees that the binary vectors are 
samples drawn from a Boltzmann distribution. 

2. 1.2. Calibration protocol 

In order to transfer the parameters from the probability distribu- 
tion Equation (4) to those of the I&F neurons, the parameters 
y, p in Equation (10) need to be fitted. An estimate of a neu- 
ron's transfer function can be obtained by computing its spike 
rate when injected with different values of constant inputs I. The 
refractory period x r is the inverse of the maximum firing rate 
of the neuron, so it can be easily measured by measuring the 
spike rate for very high input current I. Once x r is known, the 
parameter estimation can be cast into a simple linear regression 
problem by fitting log(p(!) _1 — x r ) with p/ + log(y). Figure 2 
shows the transfer curve when x r = 0 ms, which is approximately 
exponential in agreement with Equation (1). 

The shape of the transfer curse is strongly dependent on the 
noise amplitude. In the absence of noise, the transfer curve is a 
sharp threshold function, which softens as the amplitude of the 
noise is increased (Figure 1). As a result, both parameters y and 
P are dependent on the variance of the input currents from other 
neurons 7(t). Since fiq = w, the effect of the fluctuations on the 
network is similar to scaling the synaptic weights and the biases 




-6.0 -4.0 -2.0 0.0 




4y[ nA ] 

FIGURE 2 | Transfer function of I&F neurons driven by background 
white noise Equation (5). We measure the firing rate of the neuron as a 
function of a constant current injection to estimate p(uo), where for 
constant /m, uq = knj/gt- (Top) The transfer function of noisy I&F neurons 
in the absence of refractory period [p(u) = r{u), circles]. We observe that p 
is approximately exponential over a wide range of inputs, and therefore 
compatible with neural sampling. Crosses show the transfer curve of 
neurons implementing the abstract neuron Equation (1), exactly. (Bottom) 
With an absolute refractory period the transfer function approximates the 
sigmoid function. The firing rate saturates at [250]Hz due to the refractory 
period chosen for the neuron. 



which can be problematic. However, by selecting a large enough 
noise amplitude 0 and a slow enough input synapse time con- 
stant, the fluctuations due to the background input are much 
larger than the fluctuations due to the inputs. In this case, P and 
y remain approximately constant during the sampling. 

Neural mismatch can cause P and y to differ from neuron to 
neuron. From Equation (10) and the linearity of the postsynaptic 
currents /(f) in the weights, it is clear that this type of mismatch 
can be compensated by scaling the synaptic weights and biases 
accordingly. The calibration of the parameters y and p quan- 
titatively relate the spiking neural network's parameters to the 
RBM. In practice, this calibration step is only necessary for map- 
ping pre-trained parameters of the RBM onto the spiking neural 
network. 

Although we estimated the parameters of software simulated 
I&F neurons, parameter estimation based on firing rate measure- 
ments were shown to be an accurate and reliable method for VLSI 
I&F neurons as well (Neftci et al, 2012). 

2.2. VALIDATION OF NEURAL SAMPLING USING I&F NEURONS 

The I&F neuron verifies Equation ( 1 ) only approximately, and the 
PSP model is different from the one of Equation (3). Therefore, 
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the following two important questions naturally arise: how accu- 
rately does the I&F neuron-based sampler outlined above sample 
from a target Boltzmann distribution? How well does it perform 
in comparison to an exact sampler, such as the Gibbs sampler? To 
answer these questions we sample from several neural RBM con- 
sisting of five visible and five hidden units for randomly drawn 
weight and bias parameters. At these small dimensions, the proba- 
bilities associated to all possible values of the random vector z can 
be computed exactly. These probabilities are then compared to 




20ms 



FIGURE 3 | Neural Sampling in an RBM consisting of 10 stochastic l&F 
neurons, with five neurons in each layer. Each neuron is associated to a 
binary random variable which take values 1 during a refractory period x r 
after the neuron has spiked (gray shadings). The variables are sampled at 
1 kHz to produce binary vectors that correspond to samples of the joint 
distribution p(z). In this figure, only the membrane potential and the 
samples produced by the first five neurons are shown. The vectors inside 
the brackets are example samples of the marginalized distribution 
p(Z! , Z2, Z3, Z4, Z5) produced at the time indicated by the vertical lines. In 
the RBM, there are no recurrent connections within a layer. 



those obtained through the histogram constructed with the sam- 
pled events. To construct this histogram, each spike was extended 
to form a box of length x r (as illustrated in Figure 3), the spiking 
activity was sampled at 1 kHz, and the occurrences of all the pos- 
sible 2 10 states of the random vector z were counted. We added 1 
to the number of occurrences of each state to avoid zero probabil- 
ities. The histogram obtained from a representative run is shown 
in Figure 4 (left). 

A common measure of similarity between two distributions p 
and q is the KL divergence: 



D(p\\q) = J2p> l °g-- 
1i 



If the distributions p and q are identical then D(p\\q) = 0, other- 
wise D(p\ \q) > 0. The right panel of Figure 4 shows D(p\ |P e xact) 
as a function of sampling duration, for distributions p obtained 
from three different samplers: the abstract neuron based sam- 
pler with alpha PSPs (-PNS,Abstract)i the I&F neuron-based sampler 
(Pns); and the Gibbs sampler (Pcibbs)- 

In the case of the I&F neuron-based sampler, the average KL 
divergence for 48 randomly drawn distributions after 1000 s of 
sampling time was 0.059 ± 0.049. This result is not significantly 
different if the abstract neuron model Equation (1) with alpha 
PSPs is used (average KL divergence 0.10 ± 0.049), and in both 
cases the KL divergence did not tend to zero as the number 
of samples increased. The only difference in the latter neuron 
model compared to the abstract neuron model of Buesing et al. 
(2011), which tends to zero when sampling time tends to infin- 
ity, is the PSP model. This indicates that the discrepancy is largely 
due to the use of alpha-PSPs, rather than the approximation of 
Equation (1) with I&F neurons. 




^ ^ ^ ^ ^ ^ 
^ ^ ^ ^ ^ ^ ^ 



10 2 10 3 10 4 10 5 io e 

Time [ms] 



[ z l , z 2i z 3-> z 4i z 5. 



FIGURE 4 I (Left) Example probability distribution obtained by neural 
sampling of the RBM of Figure 3. The bars are marginal probabilities 
computed by counting the events [00000], [00001], . . . , [11110], [11111], 
respectively. P^s is the distribution obtained by neural sampling and P is the 
exact probability distribution computed with Equation (4). (Right) The degree 
to which the sampled distribution resembles the target distribution is 
quantified by the KL divergence measured across 48 different distributions, 
and the shadings correspond to its standard deviation. This plot also shows 
the KL divergence of the target distribution sampled by Gibbs Sampling 



(Pcibbs), which is the common choice for RBMs. For comparison with the 
neural sampler, we identified the duration of one Gibbs sampling iteration 
with one refractory period x r = 4ms. The plot shows that up to 10 4 ms, the 
two methods are comparable. After this, the KL divergence of the neural 
sampler tends to a plateau due to the fact that neural sampling with our I&F 
neural network is approximate. In both figures, Pns, Abstract refers to the 
marginal probability distribution obtained by using the abstract neuron model 
Equation (1 ). In this case, the KL divergence is not significantly different from 
the one obtained with the I&F neuron model-based sampler. 
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The standard sampling procedure used in RBMs is Gibbs 
Sampling: the neurons in the visible layer are sampled simultane- 
ously given the activities of the hidden neurons, then the hidden 
neurons are sampled given the activities of the visible neurons. 
This procedure is iterated a number of times. For comparison 
with the neural sampler, the duration of one Gibbs sampling iter- 
ation is identified with one refractory period x T = 4 ms. At this 
scale, we observe that the speed of convergence of the neural sam- 
pler is similar to that of the Gibbs sampler up to 10 4 ms, after 
which the neural sampler plateaus above the D(p\ \q) = 1CT 2 line. 
Despite the approximations in the neuron model and the synapse 
model, these results show that in RBMs of this size, the neural 
sampler consisting of I&F neurons sample from a distribution 
that has the same KL divergence as the distribution obtained after 
10 4 iterations of Gibbs sampling, which is more than the typical 
number of iterations used for MNIST hand-written digit tasks in 
the literature (Hinton et al, 2006). 

2.3. NEURAL ARCHITECTURE FOR LEARNING A MODEL OF MNIST 
HAND-WRITTEN DIGITS 

We test the performance of the neural RBM in a digit recognition 
task. We use the MNIST database, whose data samples consist 
of centered, gray-scale, 28 x 28-pixel images of hand-written 
digits 0-9 (LeCun et al, 1998). The neural RBM's network 
architecture consisted of two layers, as illustrated in Figure 5. 
The visible layer was partitioned into 784 sensory neurons (vd) 
and 40 class label neurons (v c ) for supervised learning. The 
pixel values of the digits were discretized to two values, with 
low intensity pixel values (p < 0.5) mapped to 10~ 5 and high 
intensity values (p > 0.5) mapped to 0.98. A neuron i in d 
stimulated each neuron i in layer v, with synaptic currents /, such 
that P(vj = 1) = v(f,)x r = pi, where 0 < p, < 1 is the value of 
pixel i. The value f, is calculated by inverting the transfer function 

of the neuron: fi = (s) = log ^ ^^ J p^ 1 . Using this RBM, 
classification is performed by choosing the most likely label given 
the input, under the learned model. This equals to choosing the 



population of class neurons associated to the same label that has 
the highest population firing rate. 

To reconstruct a digit from a class label, the class neurons 
belonging to a given digit are clamped to a high firing rate. For 
testing the discrimination performance of an energy-based model 
such as the RBM, it is common to compute the free-energy F(v c ) 
of the class units (Haykin, 1998), defined as: 

exp(-f(v c )) = ^]exp(-£(v d , v c , h)), (11) 

v<j,/l 



Table 1 | List of parameters used in the software simulations. 3 



Vbias 


Mean firing rate of 
bias Poisson spike 
train 


All figures 


1000 Hz 


0" 


Noise amplitude 


All figures, except Figure 1 
Figure 1 (left) 
Figure 1 (right) 
Figure 1 (bottom) 


3- 10" 11 A/s 05 
2- 10" 11 A/s 05 
3 ■ 10" 10 A/s 0 5 
1 ■ 10- 9 A/s 0 - 5 


p 


Fvnnnontial f artnr 
LA)JUI Id ILIdl I aL. LUI 


All f ini i roc 
rti i i ly u i Co 


9 ClAA ■ 1D 9 A -1 




(fit) 






y 


Racplinp firinn r3tp 

UuOGlll IC I I I 1 1 I I uLC 

(fit) 


All fini itpc 

rAI I I I M U I Co 


RR0R hb 

UUUU 1 IZ. 


tr 


Refractory period 


All figures 


4 ms 


T-syn 


Timp pnnctant of 

1 II 1 1 O U \J 1 1 o LCI 1 IL \J\ 

recurrent, and bias 
synapses 


All fini itpc 

AAI I I I M I Co 


4 ms 


Tbr 


"Burn-in" time of the 
neural sampling 


All figures 


10ms 


9l 


Leak conductance 


All figures 


1 nS 




Reset potential 


All figures 


OV 


c 


Membrane 
capacitance 


All figures 


10" 12 F 


9 


Firing threshold 


All figures 


100 mV 


w 


RBM weight matrix 

(e R N " xN ») 


Figure 4 


W(-0.75, 1.5) 


bv, b h 


RBM bias for layer u 
and h 


Figure 4 


W(-1.5,0.5) 


N v ,N h 


Number of visible 
and hidden units 


Figure 4 


5.5 




in the RBM 


Figures 7, 8, 7 


824. 500 


N c 


Number of class 
label units 


Figures 7, 8, 7 


40 


2T 


Epoch duration 


Figures 4, 7, 8 
Figure 9 


100 ms 
300 ms 




Simulation time 


Figure 2 
Figure 4 
Figure 7 
Figure 9 

Figure 8 (testing) 
Figure 8 (learning) 


5s 

1000 s 
0.2 s 
0.85 s 
1.0 s 
2000 s 


TSTDP 


Learning time 
window 


Figure 7 


4 ms 




Learning rate 


Standard CD 
Event-driven CD 


0.1 ■ 10" 2 
3.2- 10" 2 



a Software simulation scripts are available online (https://github.com/ 
eneftci/eCD). 



Hidden 
(500 neurons 



Visible „. I 
(784 neurons) d I 




W r . 




| qi Class 
I V C (40 neurons) 



^ Data Layer 
784+40 neurons 



FIGURE 5 | The RBM network consists of a visible and a hidden layer. 

The visible layer is partitioned into 784 sensory neurons (v^) and 40 class 
label neurons (v c ) for supervised learning. During data presentation, the 
activities in the visible layer are driven by a data layer d, consisting of a digit 
and its label (1 neuron per label). In the RBM, the weight matrix between 
the visible layer and the hidden layer is symmetric. 
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and selecting v c such that the free-energy is minimized. The 
spiking neural network is simulated using the BRIAN simula- 
tor (Goodman and Brette, 2008). All the parameters used in the 
simulations are provided in Table 1. 

3. RESULTS 

3.1. EVENT-DRIVEN CONTRASTIVE DIVERGENCE 

A Restricted Boltzmann Machine (RBM) is a stochastic neural 
network consisting of two symmetrically interconnected layers 
composed of neuron-like units — a set of visible units v and a set 
of hidden units h, but has no connections within a layer. 

The training of RBMs commonly proceeds in two phases. At 
first the states of the visible units are clamped to a given vec- 
tor from the training set, then the states of the hidden units 
are sampled. In a second "reconstruction" phase, the network is 
allowed to run freely. Using the statistics collected during sam- 
pling, the weights are updated in a way that they maximize 
the likelihood of the data (Hinton, 2002). Collecting equilib- 
rium statistics over the data distribution in the reconstruction 
phase is often computationally prohibitive. The CD algorithm 
has been proposed to mitigate this (Hinton, 2002; Hinton and 
Salakhutdinov, 2006): the reconstruction of the visible units' 
activity is achieved by sampling them conditioned on the values 
of the hidden units (Figure 6). This procedure can be repeated 
k times (the rule is then called CDj-), but relatively good con- 
vergence is obtained for the equilibrium distribution even for 
one iteration. The CD learning rule is summarized as follows: 



Awjj = e((L>ify)data 



(vihj) 



0, 



(12) 



where v\ and hj are the activities in the visible and hidden lay- 
ers, respectively. This rule can be interpreted as a difference 
of Hebbian and anti-Hebbian learning rules between the visi- 
ble and hidden neurons sampled in the data and reconstruc- 
tion phases. In practice, when the data set is very large, weight 
updates are calculated using a subset of data samples, or "mini- 
batches." The above rule can then be interpreted as a stochas- 
tic gradient descent (Robbins and Monro, 1951). Although 
the convergence properties of the CD rule are the subject of 
continuing investigation, extensive software simulations show 
that the rule often converges to very good solutions (Hinton, 
2002). 



The main result of this paper is an online variation of the CD 
rule for implementation in neuromorphic hardware. By virtue 
of neural sampling the spikes generated from the visible and 
hidden units can be used to compute the statistics of the prob- 
ability distributions online (further details on neural sampling 
in the Materials and Methods section 2.1). Therefore a possi- 
ble neural mechanism for implementing CD is to use synapses 
whose weights are governed by synaptic plasticity. Because the 
spikes cause the weight to update in an online, and asynchronous 
fashion, we refer to this rule as event-driven CD. 

The weight update in event-driven CD is a modulated, pair- 
based STDP rule: 



At 



q l} =g(t) STDPgOiCf), hj(t)) 



(13) 



where g(t) eR is a zero-mean global gating signal controlling 
the data vs. reconstruction phase, q,j is the weight of the synapse 
and u,(f) and hj(t) refer to the spike trains of neurons u,- and hj, 
defined as in Equation (7). 

As opposed to the standard CD rule, weights are updated after 
every occurrence of a pre-synaptic and post-synaptic event. While 
this online approach slightly differentiates it from standard CD, it 
is integral to a spiking neuromorphic framework where the data 
samples and weight updates cannot be stored. The weight update 
is governed by a symmetric STDP rule with a symmetric temporal 
window K(t) = K(-t), Vf: 

STDPy(u,(f), hj(t)) = v,(t)A hj (t) + hj(t)A Vi (t), 

A hj (t)=Af dsK(t-s)hj(s), (14) 

J — OO 

A u .(t) = A ( dsK(s- t)v,(s), 

with A > 0 defining the magnitude of the weight updates. In 
our implementation, updates are additive and weights can change 
polarity. 

3. 1. 1. Pairwise STDP with a global modulatory signal 
approximates CD 

The modulatory signal g(t) switches the behavior of the synapse 
from LTP to LTD (i.e., Hebbian to Anti-Hebbian). The temporal 
average of g(t) must vanish to balance LTP and LTD, and must 
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FIGURE 6 | The standard Contrastive Divergence [CD) k procedure, 
compared to event-driven CD. (A) In standard CD, learning proceeds iteratively 
by sampling in "construction" and "reconstruction" phases (Hinton, 2002), 
which is impractical in a continuous-time dynamical system. (B) We propose a 
spiking neural sampling architecture that folds these updates on a continuous 
time dimension through the recurrent activity of the network. The synaptic 



weight update follows a STDP rule modulated by a zero mean signal g(t). This 
signal switches the behavior of the synapse from Long-Term Potentiation (LTP) 
to Long-Term Depression (LTD), and partitions the training into two phases 
analogous to those of the original CD rule. The spikes cause microscopic 
weight modifications, which on average behave as the macroscopic CD weight 
update. For this reason, the learning rule is referred to as event-driven CD. 
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vary on much slower time scales than the typical times scale of 
the network dynamics, denoted Tb r , so that the network samples 
from its stationary distribution when the weights are updated. 
The time constant tb r corresponds to a "burn-in" time of MCMC 
sampling and depends on the overall network dynamics and can- 
not be computed in the general case. However, it is reasonable 
to assume tbr to be in the order of a few refractory periods of the 
neurons (Buesing et al., 201 1). In this work, we used the following 
modulation function g(t): 



g(t) 



1 ifmod(r, 27) e (t br , 7) 

-1 ifmod(f, 27) e (r + t br ,2r) 

0 otherwise 



(15) 



where mod is the modulo function and 7 is a time interval. 
The data is presented during the time intervals (2i7, (2i + 1)T), 
where i is a positive integer. With the g(t) defined above, no 
weight update is undertaken during a fixed period tbr- This allows 
us to neglect the transients after the stimulus is turned on and 
off (respectively in the beginning of the data and reconstruction 
phases). In this case and under further assumptions discussed 
below, the event-driven CD rule can be directly compared with 
standard CD as we now demonstrate. The average weight update 
during (0, 2T) is: 



\dt 



lij) — C t j + Ry, 



1 (0,2T) 



T- T br 



" " 27 
27 



{(vi(t)A hj (t)) td + (h,(t)A Vt {t)) ti ) 



({vi[t)A hj (t)) tr + {hj(t)A Vi (t)) tr ), 



(16) 



where td = (tb r , 7) and t r = (7 + tbr, 27) denote the intervals 
during the positive and negative phases of g(t), and (•)(«,&) = 



1 rb 



dt: 



b — a J a 

We write the first average in Qj as follows: 
(v,(t)A hj (t)} td =A-^ f dt f dsK(t-s)v,(t)hj( S ), 

1 ^br J -tb r J -co 

i r T r 00 

= A / At AAK(A)Vi(t)hj(t - A), 

7 — tbr J-tbr JO 

p CO 

= A AAK(A){vi(t)hj(t - A)) td . 
Jo 



(17) 



If the spike times are uncorrelated the temporal averages become 
a product of the average firing rates of a pair of visible and hidden 
neurons (Gerstner and Kistler, 2002): 

{Vi{t)hj{t- A)) ti = {Vi{t)) u {hj{t-A)) ti =: v+h+. 

If we choose a temporal window that is much smaller than 7, and 
assume the network activity is stationary in the interval (tbr, 7), 



we can write (up to a negligible error Kempter et al., 2001) 

p oo 

(v,(t)A hj (t)), d =Av+h+J^ AAK(A). (18) 

In the uncorrelated case, the second term in C« contributes the 
same amount, leading to: 

Qj = r\v+h+. 

with T) := 2A T ^j** J™ AAK(A). Similar arguments apply to the 
averages in the time interval t r : 

poo 

Rij = 2A / AAK(A){vi(t)hj(t - A)) fr = r\vrh~ 
Jo 1 



with 



u- h- := (vj(t)) tr (hj(t — A)) tr . The average update in 



(0, 27) then becomes: 



dl 



(0,2T) 



1 {vfh+ - v i h. ) 



(19) 



According to Equation (18), any symmetric temporal win- 
dow that is much shorter than 7 can be used. For sim- 
plicity, we choose an exponential temporal window K(A) = 
exp(— | A/tstdpI) with decay rate tstdp ^ ^ (Figure 6B). In this 
case, r) = 2A^=^x STDP . 

The modulatory function g(t) partitions the training into 
epochs of duration 27. Each epoch consists of a LTP phase during 
which the data is presented (construction), followed by a free- 
running LTD phase (reconstruction). The weights are updated 
asynchronously during the time interval in which the neural 
sampling proceeds, and Equation (19) tells us that its average 
resembles Equation (12). However, it is different in two ways: 
the averages are taken over one data and reconstruction phase 
rather than a mini-batch of data samples and their reconstruc- 
tions; and more importantly, the synaptic weights are updated 
during the data and the reconstruction phase, whereas in the CD 
rule, updates are carried out at the end of the reconstruction 
phase. In the derivation above the effect of the weight change on 
the network during an epoch 27 was neglected for mathematical 
simplicity. In the following, we verify that despite this approxima- 
tion, the event-driven CD performs nearly as well as standard CD 
in the context of a common benchmark task. 

3.2. LEARNING A GENERATIVE MODEL OF HAND-WRITTEN DIGITS 

We train the RBM to learn a generative model of the MNIST 
handwritten digits using event-driven CD (see section 2.3 for 
details). For training, 20,000 digits selected randomly (with 
repetition) from a training set consisting of 10,000 digits were 
presented in sequence, with an equal number of samples for each 
digit. 

The raster plots in Figure 7 show the spiking activity of each 
layer before and after learning for epochs of duration 100 ms. The 
top panel shows the population-averaged weight. After training, 
the sum of the upwards and downward excursions of the average 
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FIGURE 7 | The spiking neural network learns a generative model of the 
MNIST dataset using the event-driven CD procedure. (A) Learning curve, 
shown here up to 10, 000 samples. (B) Details of the training procedure, 
before and after training (20,000 samples). During the first half of each 0.1 s 
epoch, the visible layer v is driven by the sensory layer, and the gating 
variable g is 1 , meaning that the synapses undergo LTP. During the second 
half of each epoch, the sensory stimulus is removed, and g is set to —1, so 
the synapses undergo LTD. The top panels of both figures show the mean of 
the entries of the weight matrix. The second panel shows the values of the 
modulatory signal g(t). The third panel shows the synaptic currents of a 
visible neuron, where //, is caused by the feedback from the hidden and the 



bias, and 4/ is the data. The timing of the clamping (I J) and g differ due to an 
interval xi, r where no weight update is undertaken to avoid the transients 
(see section 2). Before learning and during the reconstruction phase, the 
activity of the visible layer is random. But as learning progresses, the activity 
in the visible layer reflects the presented data in the reconstruction phase. 
This is very well visible in the layer class label neurons v c , whose activity 
persists after the sensory stimulus is removed. Although the firing rates of 
the hidden layer neurons before training is high (average 113 Hz), this is only a 
reflection of the initial conditions for the recurrent couplings W. In fact, at the 
end of the training, the firing rates in both layers becomes much sparser 
(average 9.31 Hz). 



weight is much smaller than before training, because the learn- 
ing is near convergence. The second panel shows the value of the 
modulatory signal g(t). The third panel shows the input current 
(Id) and the current caused by the recurrent couplings (J/,). 

Two methods can be used to estimate the overall recognition 
accuracy of the neural RBM. The first is to sample: the visible 
layer is clamped to the digit only (i.e., ty), and the network is 
run for Is. The known label is then compared with the posi- 
tion of the group of class neurons that fired at the highest rate. 
The second method is to minimize free-energy: the neural RBMs 
parameters are extracted, and for each data sample, the class label 
with the lowest free-energy (see section 2) is compared with the 
known label. In both cases, recognition was tested for 1000 data 
samples that were not used during the training. The results are 
summarized in Figure 8. 

As a reference we provide the best performance achieved using 
the standard CD and one unit per class label (N c = 10) (Figure 8, 
table row 1), 93.6%. By mapping the these parameters to the 
neural sampler, the recognition accuracy reached 92.6%. The dis- 
crepancy is expected since the neural sampler does not exactly 
sample from the target Boltzmann distribution (see section 2.2). 

When training a neural RBM of I&F neurons using event- 
driven CD, the recognition result was 91.9% (Figure 8, table 
row 2). The performance of this RBM obtained by minimizing its 
free-energy was 90.8%. The learned parameters performed well 
for classification using the free-energy calculation which suggests 



that the network learned a model that is consistent with the 
mathematical description of the RBM. 

In an energy-based model like the RBM the free-energy min- 
imization should give the upper bound on the discrimination 
performance (Haykin, 1998). For this reason, the fact that the 
recognition accuracy is higher when sampling as opposed to using 
the free-energy method may appear puzzling. However, this is 
possible because the neural RBM does not exactly sample from 
the Boltzmann distribution, as explained in section 2.2. This 
suggests that event-driven CD compensates for the discrepancy 
between the distribution sampled by the neural RBM and the 
Boltzmann distribution, by learning a model that is tailored to 
the spiking neural network. 

Excessively long training durations can be impractical for 
real-time neuromorphic systems. Fortunately, the learning using 
event-driven CD is fast: Compared to the off-line RBM train- 
ing (250, 000 presentations, in mini-batches of 100 samples) the 
event-driven CD training succeeded with a smaller number of 
data presentations (20, 000), which corresponded to 2000 s of 
simulated time. This suggests that the training durations are 
achievable for real-time neuromorphic systems. 

3.2. 1. The choice of the number of class neurons N c 

Event-driven CD underperformed in the case of 1 neuron per 
class label (N c = 10), which is the common choice for standard 
CD and Gibbs sampling. This is because a single neuron firing 
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FIGURE 8 | To test recognition accuracy, the trained RBMs are sampled 
using the l&F neuron-based sampler for up to 1 s. The classification is 
read out by identifying the group of class label neurons that had the highest 
activity. This experiment is run for RBM parameter sets obtained by standard 
CD (black, CD) and event-driven CD (green, eCD). To test for robustness to 
finite precision weights, the RBM was run with parameters obtained by 



event-driven CD discretized to 8 and 5 bits. In all scenarios, the accuracy after 
50 ms of sampling was above 80% and after 1 s the accuracies typically 
reached their peak at around 92%. The dashed horizontal lines show the 
recognition accuracy obtained by minimizing the free-energy (see text). The 
fact that the eCD curve (solid green) surpasses its free-energy line suggests 
that a model that is tailored to the l&F spiking neural network was learned. 



at its maximum rate of 250 Hz cannot efficiently drive the rest of 
the network without tending to induce spike-to-spike correlations 
(e.g., synchrony), which is incompatible with the assumptions 
made for sampling with l&F neurons and event-driven CD. As 
a consequence, the generative properties of the neural RBM 
degrade. This problem is avoided by using several neurons per 
class label (in our case four neurons per class label) because the 
synaptic weight can be much lower to achieve the same effect, 
resulting in smaller spike-to-spike correlations. 

3.2.2. Neural parameters with finite precision 

In hardware systems, the parameters related to the weights and 
biases cannot be set with floating-point precision, as can be done 
in a digital computer. In current neuromorphic implementations 
the synaptic weights can be configured at precisions of about 
8 bits (Yu et al., 2012). We characterize the impact of finite- 
precision synaptic weights on performance by discretizing the 
weight and bias parameters to 8 bits and 5 bits. The set of possi- 
ble weights were spaced uniformly in the interval (jjl — 4.5a, |x + 
4.5a), where |A, a are the mean and the standard deviation of 
the parameters across the network, respectively. The classifica- 
tion performance of MNIST digits degraded gracefully. In the 8 
bit case, it degrades only slightly to 91.6%, but in the case of 5 



bits, it degrades more substantially to 89.4%. In both cases, the 
RBM still retains its discriminative power, which is encouraging 
for implementation in hardware neuromorphic systems. 

3.3. GENERATIVE PROPERTIES OF THE RBM 

We test the neural RBM as a generative model of the MNIST 
dataset of handwritten digits, using parameters obtained by run- 
ning the event-driven CD. The RBM's generative property enables 
it to classify and generate digits, as well as to infer digits by com- 
bining partial evidence. These features are clearly illustrated in 
the following experiment (Figure 9). First the digit 3 is presented 
(i.e., layer vd is driven by layer d) and the correct class label in 
v c activated. Second, the neurons associated to class label 5 are 
clamped, and the network generated its learned version of the 
digit. Third, the right-half part of a digit 8 is presented, and the 
class neurons are stimulated such that only 3 or 6 are able to acti- 
vate (the other class neurons are inhibited, indicated by the gray 
shading). Because the stimulus is inconsistent with 6, the network 
settled to 3 and reconstructed the left part of the digit. 

The latter part of the experiment illustrates the integration of 
information between several partially specified cues, which is of 
interest for solving sensorimotor transformation or multi-modal 
sensory cue integration problems (Deneve et al., 2001; Doya 
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FIGURE 9 I The recurrent structure of the network allows it to classify, 
reconstruct and infer from partial evidence. (A) Raster plot of an 
experiment illustrating these features. Before time 0s, the neural RBM runs 
freely, with no input. Due to the stochasticity in the network, the activity 
wanders from attractor to attractor. At time 0s, the digit 3 is presented (i.e., 
layer v d is driven by d), activating the correct class label in v c ; At time 
f = 0.3 s, the class neurons associated to 5 are clamped to high activity and 
the rest of the class label neurons are strongly inhibited, driving the network 
to reconstruct its version of the digit in layer ty; At time f = 0.6 s, the 



right-half part of a digit 8 is presented, and the class neurons are stimulated 
such that only 3 or 6 can activate (all others are strongly inhibited as indicated 
by the gray shading). Because the stimulus is inconsistent with 6, the 
network settles to a 3 and attempts to reconstruct it. The top figures show 
the digits reconstructed in layer ty. (B) Digits 0-9, reconstructed in the same 
manner. The columns correspond to clamping digits 0-9, and each is 
different, independent run. (C) Population firing rate of the experiment 
presented in (A). The network activity is typically at equilibrium after about 
10x r = 40 ms (black bar). 



et al., 2006; Corneil et al., 2012). This feature has been used for 
auditory- visual sensory fusion in a spiking Deep Belief Network 
(DBN) model (O'Connor et al., 2013). There, the authors trained 
a DBN with visual and auditory data, which learned to asso- 
ciate the two sensory modalities, very similarly to how class 
labels and visual data are associated in our architecture. Their 
network was able to resolve a similar ambiguity as in our exper- 
iment in Figure 9, but using auditory inputs instead of a class 
label. 

During digit generation, the trained network had a tendency 
to be globally bistable, whereby the layer ty completely deacti- 
vated layer h. Since all the interactions between and v c take 
place through the hidden layer, v c could not reconstruct the 
digit. To avoid this, we added populations of I&F neurons that 
were wired to layers v and h, respectively. The parameters of 
these neurons and their couplings were tuned such that each 
layer was strongly excited when it's average firing rate fell below 
5 Hz. 

4. DISCUSSION 

Neuromorphic systems are promising alternatives for large- 
scale implementations of RBMs and deep networks, but the 
common procedure used to train such networks, Contrastive 
Divergence (CD), involves iterative, discrete-time updates that do 
not straightforwardly map on a neural substrate. We solve this 
problem in the context of the RBM with a spiking neural network 
model that uses the recurrent network dynamics to compute these 
updates in a continuous-time fashion. We argue that the recurrent 
activity coupled with STDP dynamics implements an event- 
driven variant of CD. Event-driven CD enables the system to learn 



on-line, while being able to carry out functionally relevant tasks 
such as recognition, data generation and cue integration. 

The CD algorithm can be used to learn the parameters of 
probability distributions other than the Boltzmann distribution 
(even those without any symmetry assumptions). Our choice for 
the RBM, whose underlying probability distribution is a special 
case of the Boltzmann distribution, is motivated by the following 
facts: They are universal approximators of discrete distributions 
(Le Roux and Bengio, 2008); the conditions under which a spik- 
ing neural circuit can naturally perform MCMC sampling of a 
Boltzmann distribution were previously studied (Merolla et al., 
2010; Buesing et al, 201 1); and RBMs form the building blocks of 
many deep learning models such as DBNs, which achieve state- 
of-the-art performance in many machine learning tasks (Bengio, 
2009). The ability to implement RBMs with spiking neurons and 
train then using event-based CD paves the way toward on-line 
training of DBNs of spiking neurons (Hinton et al., 2006). 

We chose the MNIST handwritten digit task as a benchmark 
for testing our model. When the RBM was trained with standard 
CD, it could recognize up to 926 out of 1000 of out-of-training 
samples. The MNIST handwritten digit recognition task was pre- 
viously shown in a digital neuromorphic chip (Arthur et al., 
2012), which performed at 89% accuracy, and in a software sim- 
ulated visual cortex model (Eliasmith et al, 2012). However, 
both implementations were configured using weights trained off- 
line. A recent article showed the mapping of off-line trained 
DBNs onto spiking neural network (O'Connor et al, 2013). Their 
results demonstrated hand-written digit recognition using neu- 
romorphic event-based sensors as a source of input spikes. Their 
performance reached up to 94.1% using leaky I&F neurons. The 
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use of an additional layer explains to a large extent their bet- 
ter performance compared to ours (91.9%). Our work extends 
(O'Connor et al, 2013) with on-line training that is based on 
synaptic plasticity, testing its robustness to finite weight preci- 
sion, and providing an interpretation of spiking activity in terms 
of neural sampling. 

To achieve the computations necessary for sampling from the 
RBM, we have used a neural sampling framework (Fiser et al., 
2010), where each spike is interpreted as a sample of an under- 
lying probability distribution. Buesing et al. proved that abstract 
neuron models consistent with the behavior of biological spik- 
ing neurons can perform MCMC, and have applied it to a basic 
learning task in a fully visible Boltzmann Machine. We extended 
the neural sampling framework in three ways: First, we identified 
the conditions under which a dynamical system consisting of I&F 
neurons can perform neural sampling; Second, we verified that 
the sampling of RBMs was robust to finite-precision parameters; 
Third, we demonstrated learning in a Boltzmann Machine with 
hidden units using STDP synapses. 

In neural sampling, neurons behave stochastically. This behav- 
ior can be achieved in I&F neurons using noisy input currents, 
created by a Poisson spike train. Spike trains with Poisson-like 
statistics can be generated with no additional source of noise, for 
example by the following mechanisms: balanced excitatory and 
inhibitory connections (van Vreeswijk and Sompolinsky, 1996), 
finite-size effects in a large network, and neural mismatch (Amit 
and Brunei, 1997). The latter mechanism is particularly appeal- 
ing, because it benefits from fabrication mismatch and operating 
noise inherent to neuromorphic implementations (Chicca and 
Fusi, 2001). 

Other groups have also proposed to use I&F neuron mod- 
els for computing the Boltzmann distribution. (Merolla et al., 
2010) have shown that noisy I&F neurons' activation function is 
approximately a sigmoid as required by the Boltzmann machine, 
and have devised a scheme whereby a global inhibitory rhythm 
drives the network to generate samples of the Boltzmann distri- 
bution. O'Connor et al. (2013) have demonstrated a deep belief 
network of I&F neurons that was trained off-line, using standard 
CD and tested it using the MNIST database. Independently and 
simultaneously to this work, Petrovici et al. (2013) demonstrated 
that conductance-based I&F neurons in a noisy environment are 
compatible with neural sampling as described in Buesing et al. 
(2011). Similarly, Petrovici et al. find that the choice of non- 
rectangular PSPs and the approximations made by the I&F neu- 
rons are not critical to the performance of the neural sampler. Our 
work extends all of those above by providing an online, STDP- 
based learning rule to train RBMs sampled using I&F neurons. 

4.1. APPLICABILITY TO NEUROMORPHIC HARDWARE 

Neuromorphic systems are sensible to fabrication mismatch 
and operating noise. Fortunately, the mismatch in the synaptic 
weights and the activation function parameters y and p" are not 
an issue if the biases and the weights are learned, and the func- 
tionality of the RBM is robust to small variations in the weights 
caused by discretization. These two findings are encouraging for 
neuromorphic implementations of RBMs. However, at least two 
conceptual problems of the presented RBM architecture must be 
solved in order to implement such systems on a larger scale. First, 



the symmetry condition required by the RBM does not necessar- 
ily hold. In a neuromorphic device, the symmetry condition is 
impossible to guarantee if the synapse weights are stored locally 
at each neuron. Sharing one synapse circuit per pair of neurons 
can solve this problem. This may be impractical due to the very 
large number of synapse circuits in the network, but may be 
less problematic when using Resistive Random-Access Memorys 
(RRAMs) (also called memristors) crossbar arrays to emulate 
synapses (Kuzum et al., 2011; Cruz-Albrecht et al., 2013; Serrano- 
Gotarredona et al., 2013). RRAM are a new class of nanoscale 
devices whose current-voltage relationship depends on the his- 
tory of other electrical quantities (Strukov et al., 2008), and so 
act like programmable resistors. Because they can conduct cur- 
rents in both directions, one RRAM circuit can be shared between 
a pair of neurons. A second problem is the number of recur- 
rent connections. Even our RBM of modest dimensions involved 
almost two million synapses, which is impractical in terms of 
bandwidth and weight storage. Even if a very high number of 
weights are zero, the connections between each pair of neurons 
must exist in order for a synapse to learn such weights. One pos- 
sible solution is to impose sparse connectivity between the layers 
(Murray and Kreutz-Delgado, 2007; Tang and Eliasmith, 2010) 
and implement synaptic connectivity in a scalable hierarchical 
address-event routing architecture (loshi et al, 2010; Park et al., 
2012). 

4.2. OUTLOOK: A CUSTOM LEARNING RULE 

Our method combines I&F neurons that perform neural sam- 
pling and the CD rule. Although we showed that this leads to a 
functional model, we do not know whether event-driven CD is 
optimal in any sense. This is partly due to the fact that CDjt is an 
approximate rule (Hinton, 2002), and it is still not entirely under- 
stood why it performs so well, despite extensive work in studying 
its convergence properties (Carreira-Perpinan and Hinton, 2005). 
Furthermore, the distribution sampled by the I&F neuron does 
not exactly correspond to the Boltzmann distribution, and the 
average weight updates in event-driven CD differ from those of 
standard CD, because in the latter they are carried out at the end 
of the reconstruction step. 

A very attractive alternative is to derive a custom synap- 
tic plasticity rule that minimizes some functionally relevant 
quantity (such as Kullback-Leibler divergence or Contrastive 
Divergence), given the encoding of the information in the I&F 
neuron (Deneve, 2008; Brea et al., 2013). A similar idea was 
recently pursued in Brea et al. (2013), where the authors derived 
a triplet-based synaptic learning rule that minimizes an upper 
bound of the Kullback-Leibler divergence between the model 
and the data distributions. Interestingly, their rule had a similar 
global signal that modulates the learning rule, as in event-driven 
CD, although the nature of this resemblance remains to be 
explored. Such custom learning rules can be very beneficial in 
guiding the design of on-chip plasticity in neuromorphic VLSI 
and RRAM nanotechnologies, and will be the focus of future 
research. 
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